Documentation for KWIC.PRG

        Key Words in Context listings have been in use for 
        several years by technical libraries.  
        Essentially, they provide an abstracting service 
        where no provider of such a service exists.  Key 
        words are extracted from the title of a book or 
        journal article, sorted, and the entire title is 
        printed in its original context; that is, the 
        full title of the work.  A key word is simply any 
        word that is not on a list of words to be 
        excluded. Excluded words include the articles and 
        most prepositions and conjunctions.  Thus, a 
        person doing research on migraine headaches looks 
        under 'migraine', 'headache', any other synonyms 
        he knows, and under the drugs commonly prescribed 
        for this terrible malady. Some of the titles will 
        appear more than once during his search, of 
        course, but once he finishes he knows that he has 
        exhausted the information content of the set of 
        titles.  The listing can be viewed directly on the 
        monitor instead of on hard copy, but the printed 
        listing offers portability.  Also, I find the 
        presence of the computer a distraction; one gets 
        fascinated with the research tool instead of the 
        problem being researched.  (How many ex-chemists, 
        biologists, astronomers, ... are now computer 
        programmers?)
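
        The extraction described above can be sketched in a 
        few lines.  This is a minimal illustration in Python, 
        not the Pascal source of KWIC itself; the function 
        name, the punctuation rule, and the sample titles are 
        all assumptions made for the example.

```python
def kwic_entries(titles, stop_words):
    """Return (key word, title) pairs, sorted by key word in ASCII order."""
    entries = []
    for title in titles:
        for word in title.split():
            # Trim surrounding punctuation; the real program's rules may differ.
            key = word.strip(".,;:!?'\"()")
            # The excluded-word list is assumed to be lower case.
            if key and key.lower() not in stop_words:
                entries.append((key, title))
    return sorted(entries)


titles = ["The Joy of Cooking", "Cooking with Wine"]
stop_words = {"the", "of", "with"}
for key, title in kwic_entries(titles, stop_words):
    print(f"{key:<10} {title}")
```

        Each title shows up once for every key word it 
        contains, which is why some titles appear more than 
        once during a search.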

        
                         Getting started
        If you have a reasonably standard printer that is 
        powered up and ready  to print, you can 
        demonstrate the program simply by executing it and 
        giving it the default response (a carriage return) 
        every time a question is asked.  Sample files are 
        included to be used as a base for a built-in 
        demonstration.  
        
                      How I use this program
        I collect cookbooks.  I have no data base 
        program to assist me.  There are several 
        sub-categories in the collection: paperbacks, 
        unusually prolific authors, books that are 
        encyclopedic in nature,  books too big for 
        ordinary shelves, and so on.  The sample file 
        included here is one of the sub-collections.  The 
        main file of rather ordinary hardbound books is 
        too large to include in an upload.  The main file 
        starts with the book title in column 1 and has the 
        author's name starting in column 40.  (Note that 
        the sample file included here is a _sub_ 
        collection and does not have the author's name 
        field.)  The file is sorted by author name and 
        the books are shelved in that order too.  Sorting 
        is done with a public domain sort program.  Each 
        book is represented by a one line entry.  This is 
        necessary because the sort program I use demands a 
        _consistent_ record length.  That is, a record is 
        zero or more characters followed by a CR and LF.  
        
        To make a KWIC listing, I specify the particular 
        file of interest when 'input file' is called for.  
        I often make composite listings including several 
        sub-categories.  To make a composite file, I find 
        the public domain program PCOMMA (aka PCOMMAND), 
        which emulates PC-DOS, invaluable.  I have several 
        .BATch files which, when run, join up the individual 
        files in various ways and produce the desired 
        composite file. 
        
        When the file of 'bad words' is called for I use a 
        personalized file.  Some words are used so often 
        _within_ a specialty that they become simply 
        'noise words'.  In cookbooks, such words as 
        'cook', 'book', 'cookbook', and 'recipe' come up so 
        often as to be meaningless. When the program asks 
        for columns to be ignored, I specify column 39.  
        This means that the author's name is not a 
        keyword. After all, I already have a listing 
        sorted by author's name.  I also sometimes have 
        notes beyond column 39 and I want them ignored 
        too, as far as key words are concerned. 
        
        When KWIC asks for the leftmost column for the 
        keyword, I specify column 60.  I then specify an 
        Epson printer, with printing to be 137 columns 
        wide.  So I end up with a hard copy with key words 
        nicely aligned on column 60 and the author's name 
        is on the same line (but not aligned properly, 
        unfortunately) so I can find the book physically 
        without referring to another index.  That's how I 
        use it. Now on to the general nature of the beast.
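
        The two column settings above can be pictured with a 
        short Python sketch.  It is an illustration only, not 
        the program's actual logic: the function names are 
        made up, and it assumes that "ignore column 39" means 
        key words are taken only from columns 1 through 39 
        and that the whole record is shifted so the key word 
        starts in the chosen column.

```python
def key_offsets(record, last_column=39):
    """Yield the 0-based start offset of each word in columns 1..last_column."""
    in_word = False
    for i, ch in enumerate(record[:last_column]):
        if ch != " " and not in_word:
            yield i
        in_word = ch != " "


def align_record(record, key_offset, key_column=60, width=137):
    """Shift record so the character at key_offset lands in key_column (1-based)."""
    shift = (key_column - 1) - key_offset
    if shift >= 0:
        line = " " * shift + record
    else:
        line = record[-shift:]  # drop text that would fall left of column 1
    return line[:width]         # cut off anything past the printer width


# Title in column 1, author's name starting in column 40.
record = "Joy of Cooking".ljust(39) + "Rombauer"
for offset in key_offsets(record):
    print(align_record(record, offset))
```

        Because the whole record moves with the key word, 
        the author's name lands in a different column on 
        each line, which matches the misalignment noted 
        above.  Filtering against the bad word file is left 
        out of this sketch.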

        
                          The Input file
        The input file is prepared with your favorite text 
        editor or a word processor in ASCII mode.  It is a 
        list of book titles, journal articles, or any 
        analogous item.  An entry normally starts in 
        column 1 and can be as long as desired, within 
        reason.  The program will work best, however, with 
        relatively short entries, say 80 characters or 
        fewer.  Normal practice will result in most key 
        words having the initial letter in upper case.  
        The program will find them regardless of upper or 
        lower case.  But after they are found 
        they are sorted following the collating order of 
        ASCII.  That means that 'a' follows 'Z'.  Numbers 
        will be found as key words too.  Sorting puts all 
        digits ahead of all letters.  Blank lines in the 
        input file will be ignored.  
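
        The effect of ASCII collating order is easy to see 
        in a small Python example.  Python compares strings 
        by character code, which for plain ASCII text gives 
        the same ordering described above; the word list is 
        invented for the illustration.

```python
words = ["apple", "Zucchini", "100", "Bread", "zest"]

# Digits sort first, then upper case letters, then lower case letters.
print(sorted(words))
# ['100', 'Bread', 'Zucchini', 'apple', 'zest']
```

        So a lower case key word such as 'apple' ends up 
        after every capitalized key word in the listing.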
        
        
        
                    The bad word file
        
        You can make your own personalized bad word file 
        by modifying the file included with the .ARC.  It 
        is a simple text file, too.  The word must start 
        in column 1 and be followed _immediately_ by 
        [Return].  That is, 'apple' is not the same as 
        'apple '.  These words should all be lower case.  
        You can enter the words in any order that occurs 
        to you; the program will automatically do a simple 
        re-sort of the bad word file every time it runs.  
        You can have as many bad word files as you wish.  
        The file included is specialized for cookbooks.  
        About the first 80 entries would apply to any English 
        title; simply remove the specialized words and 
        replace them with your own set.  
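
        The rules for the bad word file can be sketched as 
        follows.  This is a hedged Python illustration, not 
        the program's own code; the function names and the 
        sample lines are assumptions.  Only the line ending 
        is removed, so a trailing blank stays part of the 
        word, just as described above.

```python
def load_bad_words(lines):
    """Build the sorted exclusion list from the lines of a bad word file."""
    # Strip only the CR/LF: 'apple ' (trailing blank) stays distinct from 'apple'.
    words = [line.rstrip("\r\n") for line in lines]
    return sorted(w for w in words if w)


def is_bad(word, bad_words):
    """Compare in lower case, since the file is expected to be lower case."""
    return word.lower() in bad_words


# The lines could equally come from an open file; these are sample data.
bad_words = load_bad_words(["the\r\n", "cook\r\n", "apple \r\n"])
print(bad_words)                   # ['apple ', 'cook', 'the']
print(is_bad("Cook", bad_words))   # True
print(is_bad("apple", bad_words))  # False, the entry has a trailing blank
```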
        
        
                          Program Output
        After the program has run and extracted all the 
        key words and sorted them, it is ready to provide 
        output.  Since the program may run for several 
        minutes, it allows you to get several 
        outputs from a single run of the program.  The 
        monitor choice is mostly offered as a preview to 
        get an idea of whether things turned out OK.  
        Since it is limited to 80 columns width, it is not 
        very effective for long records.  The basic output 
        will often go to an Epson printer with the 
        137-column line choice.  This permits you to align 
        the key words at, say, column 60 and get a nice looking 
        output with reasonably sized titles.
        
                       Non-Epson printers
        If you have a non-Epson printer, there are two 
        alternatives. The first alternative is to set up 
        the printer to produce some kind of compressed 
        printing _before_ you run KWIC.  KWIC will not 
        send anything except data to the printer if you 
        specify non-Epson.  You can also use this approach 
        if you want more than 137 columns on an Epson 
        printer; the printer can easily go to 160 columns 
        and can even be pushed to exceed that.
        
        The other alternative is to specify output to a 
        file.  This will be an ordinary ASCII file which 
        you can read into a text editor, perhaps do 
        further editing, and output the same way you would 
        any other text file.  One word about writing to a 
        file.  The program uses the default Personal 
        Pascal text file write and it performs an 
        incredible amount of slow activity on the disk.  
        If you have a nervous temperament, as I do, and 
        you see hundreds of writes to your disk, you may 
        get very tense.  The program works fine, but if 
        this bothers you, write to a blank floppy disk 
        (making things even slower!) and then copy that 
        file to your hard disk.  I could have speeded this 
        write up, but considering the nature of the 
        program, it just didn't seem worth the effort.  
        Note also that the file produced can easily be quite 
        large; one that I commonly produce is in excess of
        200,000 bytes.

        
                          Loose ends
        The program allows input files to be up to 110,000 
        bytes long and to have up to 8,000 key words.  Normal 
        printing would produce a listing of up to 140 pages.  
        One of these sizes may be too small for your 
        situation, or the ratio (the number of key words 
        per title) may be wrong for you.  These numbers 
        were chosen to allow the program to run in a 
        system that has about 250K bytes of free RAM.  If 
        you want a customized version, send me E-Mail on 
        GEnie and I can probably make a special version 
        for you to fit your needs.

        The program doesn't care what file names or file 
        name extensions you use; the names provided on the 
        file selectors are merely suggestions.

        For those interested in Pascal, note that the sort 
        program included can be used as a debugged 
        Quicksort.  To customize it, simply change the 
        type declarations and the SWAP procedure.  The 
        base routine is fast; it is slow as used here only 
        because it uses Pascal string logic to compare two 
        11-character strings.  This could easily be 
        speeded up, but considering the nature of this 
        program, it didn't seem worthwhile.  The procedure 
        that reads a file of arbitrary length into an 
        ARRAY of characters might also be useful; it seems 
        that so many programs start out (or should start 
        out) by doing just that.
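
        For readers who want to see the shape of such a 
        routine, here is a minimal Quicksort sketch with a 
        separate swap procedure.  It is written in Python 
        rather than Pascal and is not the routine shipped 
        with KWIC; the names and the choice of the middle 
        element as pivot are assumptions.  Only the 
        11-character comparison is taken from the text above.

```python
KEY_LEN = 11  # keys are compared as 11-character strings


def swap(items, i, j):
    """Exchange two entries; change this if the record type changes."""
    items[i], items[j] = items[j], items[i]


def quicksort(items, lo=0, hi=None):
    """Sort items in place by their first KEY_LEN characters."""
    if hi is None:
        hi = len(items) - 1
    if lo >= hi:
        return
    pivot = items[(lo + hi) // 2][:KEY_LEN]
    i, j = lo, hi
    while i <= j:
        while items[i][:KEY_LEN] < pivot:
            i += 1
        while items[j][:KEY_LEN] > pivot:
            j -= 1
        if i <= j:
            swap(items, i, j)
            i += 1
            j -= 1
    quicksort(items, lo, j)
    quicksort(items, i, hi)


words = ["zest", "Bread", "apple", "100", "Zucchini"]
quicksort(words)
print(words)
```

        Entries whose first 11 characters are identical are 
        treated as equal, so their relative order is not 
        guaranteed.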
          
        This program may be freely copied, uploaded, and 
        propagated by any suitable means as long as the
        content passed on includes _all_ the files contained 
        in the original ARChive.   Additionally, the name 
        should remain KWIC.ARC unless the target system
        already has that name in use.  Placed in the public domain 
        July 1991.
        
                         Merlin Hanson
                    GEnie address: M.L.HANSON